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Abstract — We consider the problem of reconstructing a sig- 
nal from noisy measurements in linear mixing systems. The 
reconstruction performance is usually quantified by standard 
error metrics such as squared error, whereas we consider any 
additive error metric. Under the assumption that relaxed belief 
propagation (BP) can compute the posterior in the large system 
limit, we propose a simple, fast, and highly general algorithm 
that reconstructs the signal by minimizing the user-defined 
error metric. For two example metrics, we provide performance 
analysis and convincing numerical results. Finally, our algorithm 
can be adjusted to minimize the £oo error, which is not additive. 
Interestingly, £oo minimization only requires to apply a Wiener 
filter to the output of relaxed BP. 

I. Introduction 

A. Motivation 

Linear mixing systems are popular models used in many 
settings such as compressed sensing [4,5], regression [6,7], 
and multiuser detection [8]. In this paper, we consider the 
following linear system [4,5,8], 

w = *x, (1) 

where the entries of the system input vector x e R N are 
independent and identically distributed (i.i.d.), the random 
linear mixing matrix (or measurement matrix) $ € ^MxN j s 
sparse and known, and the measurement vector is w e M M . 
Because each component in w is a linear combination of 
the components in x, we call the system (1) a linear mixing 
system. 

The measurements w are passed through a bank of separable 
scalar channels characterized by conditional distributions, 

M 

/v|w(y|w) = ~[[fY\w(yi\wi), (2) 

i=l 

where y is the channel output vector, and (•), denotes the ith 
element of the corresponding vector. Note that the channels are 
general and are not restricted to Gaussian [9, 10]. Our goal is 
to reconstruct the original system input x from the channel 
output y and the measurement matrix 
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B. Reconstruction quality 

The performance of the estimation process is often char- 
acterized by some additive error metric that quantifies the 
distance between the estimated signal and the original signal. 
For a signal x and its estimate x, both of length N, the 
error between them is the summation over the component- 
wise errors, 

N 

D(x,x) =^2d(x j ,x 3 ). (3) 
i=i 

Squared error, i.e., d(xj,Xj) — \xj—Xj\ 2 in (3), is one of the 
most popular error metrics in various problems, due to many 
of its mathematical advantages. For example, minimum mean 
squared error (MMSE) estimation provides both variance and 
bias information about an estimator [11], and in the Gaussian 
case it is linear and thus often easy to implement [12]. 
However, there are applications where MMSE estimation is 
inappropriate, for example it is sensitive to outliers [13, 14]. 
Therefore, alternative error metrics, such as mean absolute 
error (median), mean cubic error, or Hamming distance, are 
used instead. Considering the significance of various types of 
error metrics other than squared error, a general estimation 
algorithm that can minimize any desired error metric is of 
interest. 

The error metric function (3) has an additive structure, 
and this particular structure makes it convenient to propose 
a scalar-estimation based algorithm. However, if we want to 
prevent any significant absolute errors, i.e., to minimize the 
^oo norm error, 

||x-x|| 00 = max \Sj — xA, (4) 

je{i,...,iv}' 

where the error is not additive, then is it possible to utilize the 
additive error metric (3) to approximate a minimum mean 
error estimator! 

C. Related work 

Squared error is most commonly used as the error metric 
in estimation problems given by (1) and (2). Mean-square 
optimal analysis and algorithms were introduced in [15-19] to 
estimate a signal from measurements corrupted by Gaussian 
noise; in [9, 10,20], further discussions were made about the 



circumstances where the output channel is arbitrary, while, 
again, the MMSE estimator was put forth. Another line of 
work, based on a greedy algorithm called orthogonal matching 
pursuit, was presented in [21,22] where the mean squared 
error decreases over iterations. Absolute error is also under 
intense study in signal estimation. For example, an efficient 
sparse recovery scheme that minimizes the absolute error was 
provided in [23,24]; in [25], a fundamental analysis was 
offered on the minimal number of measurements required 
while keeping the estimation error within a certain range, 
and absolute error was one of the metrics concerned. Support 
recovery error is another metric of importance, for example 
it relates to properties of the measurement matrices [26]. The 
authors of [26-28] discussed the support error rate when re- 
covering a sparse signal from its noisy measurements; support- 
related performance metrics were applied in the derivations of 
theoretical limits on the sampling rate for signal recovery [29, 
30]. 

There have also been a number of studies on general 
properties of error related solutions. An overdetermined 
linear system y = <&x, where <I> 6 R NxM and N > M, was 
considered by Cadzow [31], and the properties of the minimum 
£oo error solutions to this system was explored. In Clark [32], 
the author developed a way to calculate the distribution of the 
greatest element in a finite set of random variables. And in 
Indyk [33], an algorithm was introduced to find the nearest 
neighbor of a point while £oo norm distance was considered. 
Finally, Lounici [34] studied the error convergence rate for 
Lasso and Dantzig estimators. 

D. Contributions 

In this paper, we review our recent work on signal estima- 
tion with arbitrary additive error metrics [1,2], and discuss its 
extension to minimizing the 4» norm error. Our contributions 
are: (/) we suggest a Bayesian estimation algorithm that mini- 
mizes an arbitrary additive error metric; (if) we prove that the 
algorithm is optimal, and study the fundamental information- 
theoretic performance limit of estimation for a given metric; 
and (Hi) we derive the performance limits for minimum mean 
absolute error and minimum mean support error. 

The proposed metric-optimal algorithm is based on the as- 
sumption that the relaxed belief propagation (BP) method [10] 
converges to a set of degraded scalar Gaussian channels [9, 15, 
17, 18]. The relaxed BP method is well-known to be optimal 
for the squared error, while we further show that the relaxed 
BP method can do more - because the sufficient statistics are 
given, other additive error metrics can also be minimized with 
one extra step, which is simple and fast. This is convenient 
for users who desire to recover the original signal with a non- 
standard additive error metric. 

Another contribution that is included in this paper is an 
application of the metric -optimal algorithm to the norm 
error, where the input signal is sparse Gaussian distributed, 
meaning that with probability s the vector entries are standard 
Gaussian, and with probability (1 — s) they are zero. We 
find [3] that, in the large system limit, the minimum mean 




Fig. 1: Tanner graph for relaxed belief propagation. 

estimator is a linear function of the output of the relaxed belief 
propagation (BP) method. Moreover, based on the metric- 
optimal algorithm, we propose a heuristic that achieves a 
lower error than the linear function does when the signal 
dimension is finite. 

The remainder of the paper is arranged as follows. We 
review the relaxed BP method in Section II. Then we describe 
our proposed algorithm, discuss its performance, and provide 
several example error metrics in Section III. We further 
discuss the possibility of extending the proposed algorithm 
for norm error in Section IV. Simulation results appear in 
Section V. 

II. Review of Relaxed Belief Propagation 

Belief Propagation (BP) [35] is an iterative method used 
to compute the marginal probabilities of a Bayesian network. 
Consider the bipartite graph, called a Tanner or factor graph, 
shown in Figure 1, where circles represent random variables 
(called variable nodes), and related variables are connected 
through functions (represented by squares, called factor nodes 
or function nodes) that indicate their Bayesian dependence. In 
standard BP, there are two types of messages passed through 
the nodes: messages from variable nodes to factor nodes, 
m x ^ y , and messages from factor nodes to variable nodes, 
m y ^ x . If we denote the set of function nodes connected to 
the variable x by N(x), the set of variable nodes connected to 
the function y by N(y), and the factor function at node y by 
tyy, then the two types of messages are defined as follows [35]: 

Wins = Yl m k^x, 

k£N(x)\y 

m y -y x = ^ ^v m i-yy> 

leN(y)\x 

where A \ B denotes the set whose elements are in A but not 
in B. 

Inspired by the basic BP idea described above, the authors 
of [9, 16] developed iterative algorithms for estimation prob- 



lems in linear mixing systems. In the Tanner graph, an input 
vector x = [x\, x 2 , xjy] is associated with the variable 
nodes (input nodes), and the output vector y — [yi, y 2 , j/m] 
is associated with the function nodes (output nodes). If <f>ij 7^ 
0, then nodes Xj and yt are connected by an edge (i, j), where 
the set of such edges E is defined as E = : $ij ^ 0}. 

In standard BP methods [8,36,37], the distribution func- 
tions of xj and tOj as well as the channel distribution function 
fY\w(yi\ w i) were set to be the messages passed along the 
graph, but it is difficult to compute those distributions, making 
the standard BP method computationally expensive. In [16], 
a simplified algorithm, called relaxed BP, was suggested. In 
this algorithm, means and variances replace the distribution 
functions themselves and serve as the messages passed through 
nodes in the Tanner graph, greatly reducing the computational 
complexity. In [9,10,20,38], this method was extended to 
a more general case where the channel is not necessarily 
Gaussian. 

In Rangan [10], the relaxed BP algorithm generates two 
sequences, qj(t) and Hj{t), where teZ + denotes the iteration 
number. Under the assumptions that the signal dimension 
N — > 00 and the iteration number t — > 00, while the 
ratio M/N is fixed, the sequences qj(t) and fJ,j(t) converge 
to sufficient statistics for the linear mixing channel obser- 
vation y (2). More specifically, the conditional distribution 
f(xj\qj(t),/j,j(t)) converges to the conditional distribution 
f(xj\y), where qj(t) can be regarded as a Gaussian-noise- 
corrupted version of xj, and fij{t) is the noise variance, 



q j (t)=x j +v j (t), 



(5) 



where Vj(t) — W(0, fJ,j(t)) for j = 1, 2, . . . , N. It has been 
shown [10] that /J.j(t) converges to a fixed point that satisfies 
Tanaka's equation, which is analyzed in detail in [15, 17, 18, 
20,37,39,40]. We define the limits of the sequences, 

lim q 3 (t) = qj, 

t— >oo 

lim Vj {t) = vj, 

t—too 

lim fj,j(t) = fi, 

t— >oo 

for j = 1,2, ... ,N. Note that all the scalar Gaussian chan- 
nels (5) converge to the same noise variance /1. We now 
simplify equation (5) as, 



Qi = x j + v 



3 ' 



(6) 



where Vj ~ 7V(0, fi) for j = 1, 2, . . . , N. 



III. Estimation Algorithm for Additive Errors 
A. Algorithm 

In this paper, we utilize the outputs of the relaxed BP 
method by Rangan [10], in particular the software package 
"GAMP" [41]. Figure 2 illustrates the structure of our metric- 
optimal estimation algorithm (dashed box). The inputs of the 
algorithm are: (i) a distribution function ,/x( x ), the prior of the 
original input x; (;;) a vector q = [qi, q 2 , gjv], the outputs 
of the scalar Gaussian channels computed by relaxed BP [10]; 
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Fig. 2: The structure of the metric-optimal estimation algorithm. 

(Hi) a scalar fi, the variance of the Gaussian noise in (6); and 
(iv) an additive error metric function D(x, x) (3) specified by 
the user. 

Because the scalar channels have additive Gaussian noise, 
and the variances of the noise are all fi, we can compute 
the conditional probability density function ./x|Q( x |q) from 
B ayes' rule: 



/x|q(x|q) = 



/q|x(q|*)/x(x) 

/q(q) 

/q|x(q|x)/x(x) 
J/q|x(q|x)/x(x)dx' 



(7) 



where 



/q|x(q|x) 



V(27T/i) 



exp 



iq-xiia 

2/i 



Given an error metric D(x, x), the optimal estimand x opt is 
generated by minimizing the conditional expectation of the 
error metric E[D(x,x)\q\, which is easy to compute using 
/x|q(x|q): 

E[D(%x)\q} = J £>(x,x)/x|Q(x|q)dx. 



Then, 



x opt = arg min E[D(x,x)\ q] 

X 

= argmin J D(% x)/ X |q(x|q)rix. (8) 

The conditional probability .fx|q( x |q) is separable, because 
the parallel scalar Gaussian channels (6) are separable and 
.fx(x) is i.i.d. Moreover, the error metric function D(x, x) (3) 
is also separable. Therefore, the problem reduces to scalar 
estimation [12], 

a? opt j- = arg min E [d(xj ,Xj)\ qj } 

Xj 

= argmin / d(xj,Xj)f x , q (xj\qj)dXj, (9) 

x o J 

for j = 1,2, ... ,N. Equation (9) minimizes a single-variable 
function. We have shown in detail [1,2] how to perform this 
minimization in two example error metrics, and summarize 
the results in Section III-C. 



MMUE(/ X ,M) = / / £>(x opt ,x) ( -=L== exp (- l|q X| H ) / x (x)rfxrfq. (10) 

/+OO / /* ,MMAE /- + 0O \ 

/ {-Xj)fx j \Q j {x j \qj)dXj+ I .r,J\ Q i.r, qj)dXj (<7j)<%. (H) 



B. Theoretical results 

Having described the algorithm, we now discuss its theo- 
retical properties based on the replica method. Because the 
replica method has not been rigorously justified, our results 
are given as claims. More detailed discussions of the claims 
can be found in Tan et al. [2]. 

Claim 1: Given the system model (1), (2) and an error 
metric Z?(x, x) of additive form (3), as the signal dimension 
N — > 00 and the measurement ratio M/N is fixed, the optimal 
estimand of the input signal is given by 

x opt = arg min E [£>(x, x) |q] , 

X 

where the vector entries q = (qi, q 2 , . . . , qn) are the outputs 
of the scalar Gaussian channels (6). 

As we mentioned in Section II, the probability density func- 
tion f Xj \-Y(xj\y) is statistically equivalent to fx^Q^x^qj) 
in the large system limit. Under this assumption, once we 
know the value of /1, estimating each Xj from all channel 
outputs y = (yi,U2, ■■■,Vm) is equivalent to estimating Xj 
from the corresponding scalar channel output qj. The relaxed 
BP method [10] calculates the sufficient statistics qj and 
fi. Therefore, an estimator based on minimizing the condi- 
tional expectation of the error metric, E (-D(x, x)|q), gives an 
asymptotically optimal result. 

Claim 2: With the optimal estimand x opt determined by 
(8), the minimum mean user-defined error (MMUE) is given 
by (10), where R(-) represents the range of a variable, and p, 
is the variance of the noise of the scalar Gaussian channel (6). 



C. Examples 

We include examples that have been worked out in detail 
in Tan et al. [1,2]. 

1) Absolute error: Because the MMSE is the mean of the 
conditional distribution, the outliers in the set of data may cor- 
rupt the estimation, and in this case the minimum mean abso- 
lute error (MMAE) could be a good alternative. The minimum 
mean absolute error is given by (11), where Xi (respectively, 
qi) is the input (respectively, output) of the scalar Gaussian 
channel (6), x ijMM AE is such that J^°° fxAQ,{ x Mi) dx i = 

' " , MMAE 

\, and f Xi \Qi(xi\qi) is a function of fx^Xi) following (7). 

2) Support recovery error: In some applications, correctly 
estimating the locations where the data has non-zero values 
is almost as important as estimating the exact values of the 
data; it is a standard model selection error criterion [26]. The 
process of estimating the non-zero locations is called support 



recovery. Support recovery error is defined as follows, and this 
metric function is discrete, 

^support {Xj ,Xj ) — XOr(Xj ,Xj), 

where xor is the standard logical operator. 

We have shown that the minimum mean support error 
(MMSuE) estimator achieves 



MMSuE(/ x , ^) = N(l - s) ■ erfc 



+ Ns ■ erf 



2(a 2 + /i) 



(12) 



where 



T = 2 ■ g2+ ^ l n I (1 ~ s )V^Vm + 1 



a 2 I n V s / 
IV. The Minimum Mean 4o Error Estimator 

Having discussed the metric-optimal algorithm, which was 
designed for additive error metrics, we now discuss how to 
extend the algorithm to support error. 

The error only considers the component with greatest 
absolute value (4), and the minimum mean error estimator 
is defined as 



Xoo = arg min E [||x - xH^lq] 



(13) 



For a scalar Gaussian channel (6), we know that the mini- 
mum mean squared error estimator is achieved by conditional 
expectation. If Xj ~ Af(0, 1), then it is the estimator jq^q 
that gives the minimum mean squared error, where [i is the 
variance of the noise in the scalar Gaussian channels (6). 
This format i s regarded as the Wiener filter in signal 

processing [42]. Our ongoing work [3] indicates that, in the 
large system limit, the Wiener filter is also optimal for 
error when the input signal is sparse Gaussian. 

Claim 3: For a set of parallel scalar Gaussian channels (6), 
where the input signal is sparse Gaussian distributed, the 

l N (13) is linear 
00, i.e., 



minimum mean error estimator e 
in the channel observation vector q as N - 



lim x 



q- 



(14) 



The detailed proof of Claim 3 is in preparation [3]. 

Unfortunately, the Wiener filtering (14) is only optimal for 
too error in the large system limit, and when the signal di- 
mension is finite, our metric-optimal algorithm in conjunction 



with a specific error metric outperforms the Wiener filtering 
in terms of error. 

Let us see what this specific error metric is. Recall that the 
definition of the £ v norm error between x and x is 



l/p 



5^ \ x j x j\ P 
\je{i,...,N} 

and it can be shown that 

lim ||x- x|| p = ||x- x||oo. 

p— > +00 

It is therefore natural to turn to the £ p norm error as an 
alternative. 

The £ p error is closely related to our definition of an additive 
error metric (3). We define 



N 



D p (x,x) = ]r 



Xi - Xo- r — x - x 



and let x p denote the estimand that minimizes the conditional 
expectation of D p (Sc,x), i.e., 



and 



arg min E [D p (x, x) | q] 

X 

argrnin£ , [||x-x||P|q], 

X 

argmin_E[|x_ ? - — ^j| p |<7i]) 



(15) 



for j e {1,2, ...,7V}. 

The numerical results in Section V show that, for some 
values of p, the estimator x p (15) achieves a lower error 
than the Wiener filter (14). 

V. Numerical results 

To illustrate the performance of our algorithms, we provide 
some numerical results in this section, for both additive error 
metrics and the 1^ norm error. The Matlab implementation 
of our algorithm can be found at http://people.engr.ncsu. 
edu/dzbaron/software/arb_metric/. It automatically computes 
equations (7)-(9) where the additive error metric (3) is given 
as the input of the algorithm. 

A. Additive error metrics 

We test our estimation algorithm on two cases modeled by 
(1) and (2): (z) sparse Gaussian input and Gaussian channel; 
(if) sparse Weibull (defined below) input and Poisson channel. 
In both cases, the input's length N is 10,000, and its sparsity 
rate is s = 3%, meaning that the entries of the input 
vector are non-zero with probability 3%, and zero otherwise. 
The matrix $ we use is Bernoulli(0.5) distributed, and is 
normalized to have unit-norm rows. In the first case, the non- 
zero input entries are Af(0, 1) distributed, and the Gaussian 
noise is Af(0, 3-10 -4 ) distributed, i.e., the signal-to-noise ratio 
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Fig. 3: Comparison of the metric-optimal estimation algorithm, re- 
laxed BP, and CoSaMP. (Sparse Gaussian input and Gaussian channel; 
sparsity rate s = 3%; input length N=10,000; SNR=20dB.) 



(SNR) is 20dB. In the second case, the non-zero input entries 
are Weibull distributed, 



f(xj;X,k) = 



> 



Xj < 



where A = 1 and k = 0.5. The Poisson channel is 



fy\w(yi\wi 



Vi 



*e{l,2,...,M} ) 



where the scaling factor of the input is a = 100. 

In order to illustrate that our estimation algorithm is suitable 
for reasonable error metrics, we considered absolute error and 
two other non-standard metrics: 



Error,, 



JV 

Ei 



where p = 0.5 or 1.5. 

We compare our algorithm with the relaxed BP [10] and 
CoSaMP [22] algorithms. In Figure 3 and Figure 4, lines 
marked with "metric-optimal" present the errors of our es- 
timation algorithm, and lines marked with "Relaxed BP" 
(respectively, "CoSaMP") show the errors of the relaxed BP 
(respectively, CoSaMP) algorithm. Each point in the figure 
is an average of 100 experiments with the same parameters. 
Because the Poisson channel is not an additive noise channel 
and is not suitable for CoSaMP, the "MAE" and the "Error 1 .5" 
lines for "CoSaMP" in Figure 4 appear beyond the scope of the 
vertical axis. It can be seen that our metric -optimal algorithm 
outperforms the two other methods. 

To demonstrate the theoretical analysis of our algorithm in 
Section III-C, we compare our MMAE estimation results with 
the theoretical limit (11) in Figure 5 a, where the integrations 
are computed numerically. In Figure 5b, we compare our 
MMSuE estimator with the theoretical limit (12), where the 
value of [i is acquired numerically from the relaxed BP 
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Fig. 4: Comparison of the metric-optimal estimation algorithm, 
relaxed BP, and CoSaMP. The "MAE" and the "Errori.5" lines for 
"CoSaMP" appear beyond the scope of the vertical axis. (Sparse 
Weibull input and Poisson channel; sparsity rate s = 3%; input length 
N=10,000; input scaling factor a = 100.) 
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Fig. 6: The performance of the Wiener filter as well as the estima- 
tors X5, xio, and X15 in terms of loo error. (Sparse Gaussian input 
and Gaussian channel; sparsity rate s = 5%; measurement ratio 
M/N = 0.3; SNR=20dB.) 
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Fig. 5: Comparisons of the metric-optimal estimators and the cor- 
responding theoretical limits (1 1), and (12). The corresponding lines 
coincide. (Sparse Gaussian input and Gaussian channel; sparsity 
rate s = 3%; input length N=10,000; SNR=20dB.) 



method [41] with 20 iterations. In these two figures, each 
point on the "metric-optimal" line is generated by averaging 
40 experiments with the same parameters. It is shown in 
both figures that the lines coincide. Therefore our estimation 
algorithm reaches the corresponding theoretical limits and is 
optimal. 



B. The norm error 

To show the performance of the Wiener filter and our 
metric-optimal algorithm in terms of the l^ error, we test the 
Wiener filter (14) as well as the estimator (15). The simulation 
settings remain the same with the previous simulations, except 
that we set the sparsity rate s — 5%, and we only test the 
sparse Gaussian signal. The ratio M/N is fixed to 0.3, and 
the signal dimension N ranges from 500 to 20, 000. We then 
run our metric-optimal algorithm with p = 5, 10, 15 in (15). 
In Figure 6, we present the l^ norm error of the Wiener 
filter (14), along with the l^ norm error of X5, Xio, and 
X15 (15). It can be seen that the estimators Xg, Xio, and Xig 
all achieve lower errors than the Wiener filter (13) does, 
because the signal dimension is finite. Thus, for a finite signal 
dimension, it is reasonable to choose a proper value of p, 
and run the metric-optimal algorithm for x p (15) to achieve 
a low loo error. That said, these results are specific to sparse 
Gaussian inputs, and it remains to be seen whether they will 
carry over to other input distributions. 
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